Using ArchiveBox
(To be continued)
Installing ArchiveBox
# create a folder to store your data (can be anywhere)
mkdir -p ~/archivebox/data && cd ~/archivebox
# download the compose file into the directory
# curl -fsSL 'https://docker-compose.archivebox.io' > docker-compose.yml
curl --proxy http://127.0.0.1:7890 -fsSL 'https://docker-compose.archivebox.io' > docker-compose.yml
# (shortcut for getting https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/stable/docker-compose.yml)
# initialize your collection and create an admin user for the Web UI (or set ADMIN_USERNAME/ADMIN_PASSWORD env vars)
docker compose run archivebox init
docker compose run archivebox manage createsuperuser
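The proxy address in the download command is specific to one network setup. As a minimal sketch, the helper below adds `--proxy` only when a proxy is configured. The variable name `ARCHIVEBOX_PROXY` and the function `fetch` are hypothetical names made up for this example. They are not part of ArchiveBox.

```shell
# Sketch: download a file with curl, going through a proxy only when one is set.
# ARCHIVEBOX_PROXY is a hypothetical variable name used only in this example.

# fetch URL OUTPUT_FILE
fetch() {
  if [ -n "${ARCHIVEBOX_PROXY:-}" ]; then
    # A proxy is configured: pass it to curl explicitly.
    curl --proxy "$ARCHIVEBOX_PROXY" -fsSL "$1" -o "$2"
  else
    # No proxy configured: plain direct download.
    curl -fsSL "$1" -o "$2"
  fi
}
```

With this in place, `ARCHIVEBOX_PROXY=http://127.0.0.1:7890 fetch 'https://docker-compose.archivebox.io' docker-compose.yml` would reproduce the proxied download, and leaving the variable unset would give the direct one.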
Sonic full-text search
# download the sonic config file into your data folder (e.g. ~/archivebox)
# curl -fsSL 'https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/dev/etc/sonic.cfg' > sonic.cfg
curl --proxy http://127.0.0.1:7890 -fsSL 'https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/dev/etc/sonic.cfg' > sonic.cfg
# then uncomment the sonic-related sections in docker-compose.yml
vi docker-compose.yml
# to backfill any existing archive data into the search index, run:
docker compose run archivebox update --index-only
docker compose up -d
At this point the web UI can be browsed at ip:8000, where ip is the address of the host.
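The containers can take a moment to start after `docker compose up -d`, so the page may not answer right away. The following is a minimal polling sketch, assuming curl is installed on the host. The function names `probe` and `wait_for_ui` are made up for this example.

```shell
# Sketch: poll a URL until it answers, or give up after a number of attempts.

# probe URL: succeed if the URL answers without an HTTP error status (>= 400).
probe() {
  curl -fs -o /dev/null --max-time 5 "$1"
}

# wait_for_ui URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_ui() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if probe "$url"; then
      echo "up after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not reachable after $attempts attempt(s)" >&2
  return 1
}
```

For example, `wait_for_ui http://localhost:8000` would retry every 2 seconds, up to 30 times.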
Installing Chrome
Archiving content that requires a login is set up through cookies.
sudo apt update
sudo apt install chromium-browser
# or on some systems: sudo apt install chromium
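Because the package name differs between systems, the installed binary may be called either `chromium-browser` or `chromium`. This small sketch finds whichever one is present. The function name `first_available` is made up for this example.

```shell
# Sketch: print the path of the first command from a list that exists on PATH.

# first_available NAME...: print the first match, or fail if none is installed.
first_available() {
  for name in "$@"; do
    if path=$(command -v "$name" 2>/dev/null); then
      printf '%s\n' "$path"
      return 0
    fi
  done
  return 1
}
```

A call such as `first_available chromium-browser chromium` prints the path of whichever binary was installed, and returns a non-zero status if neither is found.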
Edit docker-compose.yml
services:
  archivebox:
    ...
    volumes:
      ...
      - ./data/personas/Default:/data/personas/Default
    environment:
      - CHROME_USER_DATA_DIR=/data/personas/Default/chrome_profile
      - DISPLAY=novnc:0.0
  novnc:
    image: theasp/novnc:latest
    environment:
      - DISPLAY_WIDTH=1920
      - DISPLAY_HEIGHT=1080
      - RUN_XTERM=no
    ports:
      - "8080:8080"
Add CHROME_USER_DATA_DIR and DISPLAY.
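After editing the file by hand it is easy to leave one of the two entries out or commented. The following is a rough sanity check, assuming the list-style `- NAME=value` environment syntax used in this post. The function names `has_env_entry` and `check_compose` are made up for this example. The check does not verify which service an entry belongs to.

```shell
# Sketch: check that a compose file contains the two environment entries.

# has_env_entry FILE NAME: succeed if FILE has an uncommented "- NAME=..." item.
has_env_entry() {
  grep -Eq "^[[:space:]]*-[[:space:]]*$2=" "$1"
}

# check_compose FILE: print each missing variable; fail if any is missing.
check_compose() {
  missing=0
  for name in CHROME_USER_DATA_DIR DISPLAY; do
    if ! has_env_entry "$1" "$name"; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_compose docker-compose.yml` prints nothing and succeeds when both entries are present. Entries such as `DISPLAY_WIDTH` do not count as `DISPLAY`, because the pattern requires `=` directly after the name.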